Summarization Techniques for Pattern Collections in Data Mining

Author

  • Taneli Mielikäinen

Abstract

Discovering patterns from data is an important task in data mining. Techniques exist for finding large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A major difficulty is that pattern collections that summarize the data well are often very large. In this dissertation we describe methods for summarizing pattern collections in order to make them more understandable. More specifically, we focus on the following themes:

  • Quality value simplifications. We study simplifications of pattern collections based on simplifying the quality values of the patterns. In particular, we study simplification by discretization.
  • Pattern orderings. It is difficult to find a suitable trade-off between the accuracy of the representation and its size. As a solution to this problem, we suggest that patterns could be ordered in such a way that each prefix of the pattern ordering gives a good summary of the whole collection.
  • Pattern chains and antichains. Virtually all pattern collections have natural underlying partial orders. We exploit the partial orders over pattern collections by clustering the patterns into chains and antichains.
  • Change profiles. We describe how patterns can be related to each other by comparing how their quality values change with re-
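The pattern-ordering theme can be illustrated with a small sketch. The Python below is a minimal greedy ordering of frequent itemsets, not the dissertation's own algorithm. It rests on two assumptions of ours: a pattern's frequency is estimated from a prefix as the largest frequency of any superset in that prefix (a valid lower bound, because itemset frequency is anti-monotone), and the quality of a prefix is measured by the total absolute error over the whole collection. The function names `greedy_ordering`, `estimate` and `prefix_error` and the toy frequencies are hypothetical.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def estimate(x: Itemset, prefix: List[Itemset], freq: Dict[Itemset, float]) -> float:
    """Lower-bound estimate of freq[x] from the patterns in `prefix`.

    Assumption: frequencies are anti-monotone, so every superset y of x
    satisfies freq[x] >= freq[y]. We take the largest such bound, or 0.0
    when the prefix contains no superset of x.
    """
    return max((freq[y] for y in prefix if x <= y), default=0.0)


def prefix_error(prefix: List[Itemset], freq: Dict[Itemset, float]) -> float:
    """Total absolute error when the whole collection is estimated from `prefix`.

    The absolute-error loss is an illustrative choice, not taken from the source.
    """
    return sum(abs(f - estimate(x, prefix, freq)) for x, f in freq.items())


def greedy_ordering(freq: Dict[Itemset, float]) -> List[Itemset]:
    """Order all patterns so that each prefix is a greedily chosen good summary."""
    remaining = set(freq)
    order: List[Itemset] = []
    while remaining:
        # Append the pattern that gives the lowest error for the extended prefix;
        # ties are broken deterministically by the sorted item list.
        best = min(
            remaining,
            key=lambda p: (prefix_error(order + [p], freq), sorted(p)),
        )
        order.append(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    # Hypothetical toy frequencies that respect anti-monotonicity.
    freq = {
        frozenset("a"): 0.6, frozenset("b"): 0.5, frozenset("c"): 0.4,
        frozenset("ab"): 0.4, frozenset("ac"): 0.3, frozenset("bc"): 0.2,
        frozenset("abc"): 0.2,
    }
    order = greedy_ordering(freq)
    for k, p in enumerate(order, start=1):
        # Prefix length, the pattern just added, and the error of that prefix.
        print(k, "".join(sorted(p)), round(prefix_error(order[:k], freq), 3))
```

On the toy data the first pattern chosen is {a, b, c}: as a superset of every other itemset it yields a nonzero lower bound for the whole collection at once. Because the estimates only grow as the prefix grows and never exceed the true frequencies, the error is non-increasing along the ordering and reaches zero for the full collection. The naive error recomputation makes this cubic in the number of patterns, which is acceptable only for small illustrative collections.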

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Text Summarization and Discovery of Frames and Relationship from Natural Language Text - A R&D Methodology

The paper deals with the concept of data mining whereby the data resources can be fetched and accessed accordingly with reduced time complexity. Resource sharing is an important aspect in the field of information science. The retrieval techniques are pointed out based on the ideas of binary search tree, Gantt chart, text summarization. A theorem has been cited regarding the summation of total l...

Exploring Disease Association from the NHANES Data: Data Mining, Pattern Summarization, and Visual Analytics

Finding associations among different diseases is an important task in medical data mining. The NHANES data is a valuable source in exploring disease associations. However, existing studies analyzing the NHANES data focus on using statistical techniques to test a small number of hypotheses. This NHANES data has not been systematically explored for mining disease association patterns. In this reg...

Personal Video Manager: Managing and Mining Home Video Collections

Home video collections constitute an important source of content to be experienced within the digital entertainment context. To make such content easy to access and reuse, various video analysis technologies have been researched and developed to extract video assets for management tasks, including video shot/scene detection, keyframe extraction, and video skimming/summarization. However, one le...

CTMS: A Comparative Text Mining System

In many applications, there is often a need for comparing multiple text collections to find commonalities and differences in topical themes, a task we refer to as comparative text mining. In this paper, we present a general comparative mining system (CTMS). The CTMS system takes any two collections of text and generates a list of cross-collection themes and their associated individual collectio...

Journal:
  • CoRR

Volume: abs/cs/0505071

Publication date: 2005